Fix pathologically slow assertion diffs for large inputs (#8998) #14543

Open
kirilklein wants to merge 1 commit into pytest-dev:main from kirilklein:fix-8998-large-diff-perf

Conversation

@kirilklein

Closes #8998.

Problem

Comparing very large strings, lists, or dataclasses inside an assert can hang for a long time (sometimes minutes) while pytest builds the failure diff.

Profiling the reproductions from the issue confirms the root cause is difflib.ndiff:

  • its character-level "fancy replace" step is quadratic in the size of the differing region (so two large, mostly-different strings are catastrophic), and
  • the underlying SequenceMatcher is quadratic in the number of lines — a large nested structure pretty-prints to a huge number of lines (the dataclass example in the issue pformats to ~418,000 lines).

Approach

Following the maintainer discussion in the issue, this uses a deterministic size heuristic rather than wall-clock timeouts (which are non-deterministic and can't reliably interrupt difflib).

A new helper module _pytest/assertion/_diff.py provides:

  • ndiff_too_slow(left_lines, right_lines): returns True when the combined input exceeds a character budget or a line-count budget, the two dimensions that make ndiff slow.
  • fast_unified_diff(...) — a coarse but fast line-level difflib.unified_diff, capped to a bounded number of lines so it always completes in milliseconds. It notes in the output that a faster diff is being shown (and how many lines were hidden).

Both pathological call sites fall back to it when needed:

  • compare_text._diff_text (string comparisons)
  • _compare_sequence._compare_eq_iterable (list / dataclass / iterable comparisons)

Comparisons below the cutoffs keep the existing detailed ndiff output unchanged.
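
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names too_slow_for_ndiff, capped_unified_diff and diff_lines, the three budget constants, and the wording of the notices are all hypothetical.

```python
import difflib
from collections.abc import Iterator, Sequence

# Hypothetical budgets for illustration only; the PR's real thresholds
# were chosen from benchmarking and are not reproduced here.
MAX_CHARS = 10_000
MAX_LINES = 1_000
MAX_OUTPUT_LINES = 200


def too_slow_for_ndiff(left: Sequence[str], right: Sequence[str]) -> bool:
    """Return True when the line budget or the character budget is exceeded."""
    if len(left) + len(right) > MAX_LINES:
        return True
    total = sum(len(line) for line in left) + sum(len(line) for line in right)
    return total > MAX_CHARS


def capped_unified_diff(left: Sequence[str], right: Sequence[str]) -> Iterator[str]:
    """Yield a notice, then at most MAX_OUTPUT_LINES lines of a unified diff."""
    yield "Input too large for a detailed diff; showing a line-level diff instead:"
    shown = hidden = 0
    for line in difflib.unified_diff(left, right, lineterm=""):
        if shown < MAX_OUTPUT_LINES:
            shown += 1
            yield line
        else:
            hidden += 1
    if hidden:
        yield f"... {hidden} more diff lines hidden"


def diff_lines(left: Sequence[str], right: Sequence[str]) -> list[str]:
    """Use the detailed ndiff for small inputs and the capped diff otherwise."""
    if too_slow_for_ndiff(left, right):
        return list(capped_unified_diff(left, right))
    return list(difflib.ndiff(left, right))
```

Small inputs take the ndiff branch, so their output is the same as calling difflib.ndiff directly; only inputs over one of the budgets get the coarser diff and the notice lines.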

Results

On the reproductions from the issue (dataclass with large lists + two large random strings), with -v:

  • before: hangs (one repro profiled at ~384s of find_longest_match)
  • after: ~0.7s, with a useful fallback diff

Tests

Added regression tests in testing/test_assertion.py: unit tests for the ndiff_too_slow heuristic, and integration tests that large string / many-line / large-iterable comparisons fall back to the fast diff (no ndiff ? guide lines), still show which lines differ, and emit the line-cap notice. Thresholds were chosen from benchmarking.
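
For context, the "no ndiff ? guide lines" check relies on a visible difference between the two difflib outputs. A small standalone illustration (the helper name has_guide_lines is made up for this example and is not taken from the test suite):

```python
import difflib

left = ["hello world"]
right = ["hello wurld"]

# ndiff marks intraline changes with extra lines starting with "? ".
detailed = list(difflib.ndiff(left, right))

# unified_diff works on whole lines only and never emits "? " lines.
coarse = list(difflib.unified_diff(left, right, lineterm=""))


def has_guide_lines(lines: list[str]) -> bool:
    """Return True if any line looks like an ndiff intraline guide."""
    return any(line.startswith("? ") for line in lines)
```

A test can therefore assert that has_guide_lines is false for the fallback output while the "-" and "+" lines for the differing input are still present.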

🤖 Generated with Claude Code

@psf-chronographer psf-chronographer Bot added the bot:chronographer:provided (automation) changelog entry is part of PR label Jun 1, 2026
@Pierre-Sassoulas

Member

We have an MR in flight that uses generators in the assert repr, which could help with this when we don't have to show the actual output (#14523).

…8998)

Comparing very large strings, lists, or dataclasses in an ``assert`` could
hang for a long time (sometimes minutes) while pytest built the failure diff.
The cost comes from ``difflib.ndiff``: its character-level "fancy replace"
step is quadratic in the size of the differing region, and the underlying
``SequenceMatcher`` is quadratic in the number of lines (a large nested
structure can pretty-print to hundreds of thousands of lines).

Add a deterministic size heuristic (no wall-clock timeouts, per the
maintainer discussion in the issue): when the input is too large for
``ndiff`` to be fast, fall back to a coarser line-level ``unified_diff``,
capped to a bounded number of lines so it always completes in milliseconds,
and note this in the output. Smaller comparisons keep the existing detailed
``ndiff`` output unchanged.
@kirilklein kirilklein force-pushed the fix-8998-large-diff-perf branch from c992d71 to e232573 Compare June 11, 2026 17:04
@kirilklein

Author

Thanks! I looked at #14523. It and this PR are complementary:

  • #14523 (Use streaming in all assertion comparisons consumers) avoids computing the diff when it'll be truncated anyway (great for the default/-v case via pformat_cap), but its cap is None on CI and with -vv, where ndiff's SequenceMatcher stays quadratic. It also doesn't touch the string path (compare_text._diff_text), which is the original repro in this issue.
  • This PR caps the diff input deterministically regardless of verbosity/CI and covers both strings and iterables, so the pathological hang can't happen even when the full output is shown.

They do overlap in _compare_eq_iterable. Happy to rebase on top of #14523 once it lands, or to narrow this PR to just the cases #14523 doesn't cover (the string path + CI/-vv) — whichever you prefer.

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Member

I think this makes sense: ndiff is really costly, and if there are a ton of changes no one is going to look at everything in great detail. Maybe we could make some lines fancy and not show everything, instead of showing all the lines as non-fancy, though. Or make only the first line fancy, because -vvv means "show me the full diff" after all.

Comment on lines +26 to +28
size = sum(len(line) for line in left_lines) + sum(
    len(line) for line in right_lines
)

Member

We're summing everything here; we need to exit early as soon as size becomes greater than NDIFF_MAX_INPUT_SIZE.
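
One way the early exit could look, as a rough sketch. The function name exceeds_char_budget and the value assigned to NDIFF_MAX_INPUT_SIZE are assumptions for illustration; only the constant's name appears in the diff under review.

```python
from collections.abc import Iterable
from itertools import chain

# The constant's name comes from the diff under review; this value is a
# placeholder, since the real one is not shown in this thread.
NDIFF_MAX_INPUT_SIZE = 10_000


def exceeds_char_budget(
    left_lines: Iterable[str],
    right_lines: Iterable[str],
    budget: int = NDIFF_MAX_INPUT_SIZE,
) -> bool:
    """Return True as soon as the running character count passes the budget."""
    size = 0
    for line in chain(left_lines, right_lines):
        size += len(line)
        if size > budget:
            # Stop here: the remaining lines are never visited.
            return True
    return False
```

With this shape the cost of the check is bounded by the budget rather than by the input size, since iteration stops at the first line that pushes the total over the limit.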

Comment on lines +48 to +51
yield (
    f"Diff too large to compute in full (over {NDIFF_MAX_INPUT_SIZE} "
    "characters); showing a faster line-level diff instead:"
)

Member

The message is wrong here: the cause could be either too many lines or too many characters.
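
A possible way to address this is to have the heuristic report which budget was exceeded and build the notice from that. The sketch below is only an illustration: slow_reason, fallback_notice, NDIFF_MAX_LINES and both numeric values are hypothetical, and only NDIFF_MAX_INPUT_SIZE is a name taken from the diff.

```python
from __future__ import annotations

from collections.abc import Sequence

NDIFF_MAX_INPUT_SIZE = 10_000  # name from the diff; value is a placeholder
NDIFF_MAX_LINES = 1_000  # hypothetical name and value


def slow_reason(left: Sequence[str], right: Sequence[str]) -> str | None:
    """Return a short description of the exceeded budget, or None if within both."""
    if len(left) + len(right) > NDIFF_MAX_LINES:
        return f"over {NDIFF_MAX_LINES} lines"
    size = sum(len(line) for line in left) + sum(len(line) for line in right)
    if size > NDIFF_MAX_INPUT_SIZE:
        return f"over {NDIFF_MAX_INPUT_SIZE} characters"
    return None


def fallback_notice(reason: str) -> str:
    """Build the notice shown above the fallback diff."""
    return (
        f"Diff too large to compute in full ({reason}); "
        "showing a faster line-level diff instead:"
    )
```

The caller would fall back whenever slow_reason returns a string and pass that string to fallback_notice, so the notice always names the limit that actually triggered.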

Comment on lines +80 to +81
left_lines = left.splitlines(keepends)
right_lines = right.splitlines(keepends)

Member

Do we have to split the lines? Can't we just count the line separators?
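
Counting separators could look something like the sketch below (count_lines is a made-up name). One caveat worth keeping in mind: str.splitlines also splits on boundaries other than "\n", such as a lone "\r", so the count only matches for text that uses "\n" or "\r\n" line endings.

```python
def count_lines(text: str) -> int:
    """Approximate len(text.splitlines()) by counting newline characters.

    Exact only when newline (optionally preceded by carriage return) is
    the sole line separator in the text.
    """
    if not text:
        return 0
    # A final line without a trailing newline still counts as a line.
    return text.count("\n") + (0 if text.endswith("\n") else 1)
```

This only helps the size check; the fallback diff itself would still need the split lines.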

Comment thread testing/test_assertion.py
        assert ndiff_too_slow(["spam"], ["eggs"]) is False

    def test_many_characters_is_too_slow(self) -> None:
        assert ndiff_too_slow(["a" * 6000], ["b" * 6000]) is True

Member

Let's mock the threshold values; we don't have to actually construct an enormous list to test the behavior.
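
A sketch of what mocking the threshold might look like. Here limits is a stand-in namespace for the module that holds the constants, and this ndiff_too_slow is a simplified placeholder rather than the function in the PR; in the real test suite, pytest's monkeypatch.setattr on the module attribute would play the same role as mock.patch.object.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the module that holds the thresholds (placeholder value).
limits = SimpleNamespace(NDIFF_MAX_INPUT_SIZE=10_000)


def ndiff_too_slow(left_lines: list[str], right_lines: list[str]) -> bool:
    """Simplified placeholder: character budget only, read at call time."""
    size = sum(len(line) for line in left_lines)
    size += sum(len(line) for line in right_lines)
    return size > limits.NDIFF_MAX_INPUT_SIZE


def test_many_characters_is_too_slow() -> None:
    # Lower the budget instead of building a huge input.
    with mock.patch.object(limits, "NDIFF_MAX_INPUT_SIZE", 5):
        assert ndiff_too_slow(["spam"], ["eggs"]) is True
    # The original budget is restored once the patch exits.
    assert ndiff_too_slow(["spam"], ["eggs"]) is False
```

This only works if the function looks the threshold up at call time; a value bound as a default argument or copied into a local name at import time would not see the patch.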

Comment thread testing/test_assertion.py
        assert "- " + "a" * 50 + "eggs" in lines
        assert "+ " + "a" * 50 + "spam" in lines

    def test_text_diff_large_input_skips_ndiff(self) -> None:

Member

Let's also mock here

@Pierre-Sassoulas Pierre-Sassoulas added the type: performance performance or memory problem/improvement label Jun 14, 2026
Labels

bot:chronographer:provided (automation) changelog entry is part of PR type: performance performance or memory problem/improvement

Development

Successfully merging this pull request may close these issues.

assert str1 == str2 takes forever with long strings that differ by a short prefix

2 participants